Accurate classification of optical communication signal quality is crucial for maintaining the reliability and performance of high-speed communication networks. While existing supervised learning approaches achieve high accuracy on laboratory-collected datasets, they often generalize poorly to real-world conditions because controlled experimental data lack the variability and noise encountered in practice. In this study, we propose a targeted data augmentation framework designed to improve the robustness and generalization of binary optical signal quality classifiers. Using the OptiCom Signal Quality Dataset, we systematically inject controlled perturbations into the training data, including label boundary flipping, Gaussian noise addition, and missing-value simulation. To further approximate real-world deployment scenarios, the test set is subjected to additional distribution shifts, including feature drift and scaling. Experiments are conducted under 5-fold cross-validation to evaluate the individual and combined effects of the augmentation strategies. Results show that the optimal augmentation setting (flip_rate = 0.10, noise_level = 0.50, missing_rate = 0.20) substantially improves robustness to unseen distributions. Compared with the model trained without augmentation, accuracy rises from 0.863 to 0.950, precision from 0.384 to 0.632, F1 from 0.551 to 0.771, and ROC-AUC from 0.926 to 0.999. Our research provides an example of how to balance data augmentation intensity so that generalization improves without unduly compromising accuracy on clean data.
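The three training-set perturbations and the two test-set shifts named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The following are assumptions: the function names `augment` and `shift_test`, the optional `boundary_score` argument (flipped labels are taken to be those whose score lies closest to 0.5, as one possible reading of "boundary"), noise scaled by each feature's standard deviation, missing entries marked as `NaN`, and the `drift`/`scale` values. Only the parameter names `flip_rate`, `noise_level`, and `missing_rate` and their defaults come from the text.

```python
import numpy as np


def augment(X, y, flip_rate=0.10, noise_level=0.50, missing_rate=0.20,
            boundary_score=None, seed=0):
    """Hypothetical sketch of the training-set perturbations.

    X: (n_samples, n_features) array; y: (n_samples,) binary 0/1 labels.
    boundary_score: optional (n_samples,) scores in [0, 1]; if given, the
    flipped labels are those whose score is closest to 0.5 (an assumed
    notion of "boundary"); otherwise labels are flipped at random.
    """
    rng = np.random.default_rng(seed)
    X_aug = np.asarray(X, dtype=float).copy()
    y_aug = np.asarray(y).copy()
    n, d = X_aug.shape

    # 1) Label flipping: flip round(flip_rate * n) binary labels.
    n_flip = int(round(flip_rate * n))
    if n_flip > 0:
        if boundary_score is not None:
            dist = np.abs(np.asarray(boundary_score, dtype=float) - 0.5)
            idx = np.argsort(dist)[:n_flip]
        else:
            idx = rng.choice(n, size=n_flip, replace=False)
        y_aug[idx] = 1 - y_aug[idx]

    # 2) Gaussian noise whose std is noise_level times each feature's std.
    std = np.nanstd(X_aug, axis=0)
    X_aug += rng.normal(0.0, 1.0, size=X_aug.shape) * (noise_level * std)

    # 3) Missing-value simulation: blank out round(missing_rate * n * d) entries.
    n_missing = int(round(missing_rate * n * d))
    if n_missing > 0:
        flat_idx = rng.choice(n * d, size=n_missing, replace=False)
        X_aug.flat[flat_idx] = np.nan
    return X_aug, y_aug


def shift_test(X, drift=0.5, scale=1.2):
    """Hypothetical test-set shift: offset each feature by drift * its std
    (feature drift), then multiply everything by a constant (scaling)."""
    X = np.asarray(X, dtype=float)
    return (X + drift * X.std(axis=0)) * scale


if __name__ == "__main__":
    X = np.arange(200, dtype=float).reshape(100, 2)
    y = np.zeros(100, dtype=int)
    X_aug, y_aug = augment(X, y)
    print(y_aug.sum(), np.isnan(X_aug).sum())  # 10 flipped labels, 40 NaNs
```

The downstream classifier must either accept `NaN` inputs or be preceded by an imputation step. The sketch leaves that choice open because the abstract does not say how missing values are handled.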